Human Expertise in Algorithmic Prediction
We introduce a novel framework for incorporating human expertise into algorithmic predictions. Our approach leverages human judgment to distinguish inputs which are indistinguishable, or "look the same," to predictive algorithms. We argue that this framing clarifies the problem of human-AI collaboration in prediction tasks, as experts often form judgments by drawing on information which is not encoded in an algorithm's training data. Algorithmic indistinguishability yields a natural test for assessing whether experts incorporate this kind of side information, and further provides a simple but principled method for selectively incorporating human feedback into algorithmic predictions. We show that this method provably improves the performance of any feasible algorithmic predictor and precisely quantify this improvement. We find empirically that although algorithms often outperform their human counterparts, human judgment can improve algorithmic predictions on some instances, which can be identified ex-ante. In an X-ray classification task, we find that this subset constitutes nearly 30% of the patient population. Our approach provides a natural way of uncovering this heterogeneity and thus enabling effective human-AI collaboration.
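The test the abstract sketches is concrete enough to illustrate. The following Python is a minimal sketch under assumptions of my own: instances are binned by the algorithm's own prediction as a crude proxy for a true level set of indistinguishable inputs, and a within-bin correlation threshold stands in for the side-information test. Function names and thresholds are illustrative, not the paper's.

```python
import numpy as np

def indistinguishability_bins(algo_pred, n_bins=10):
    # Inputs the algorithm maps to (nearly) the same prediction are treated
    # as indistinguishable; quantile bins keep groups comparably sized.
    edges = np.quantile(algo_pred, np.linspace(0.0, 1.0, n_bins + 1))
    return np.clip(np.searchsorted(edges, algo_pred, side="right") - 1,
                   0, n_bins - 1)

def expert_has_side_information(bins, expert_pred, y, min_corr=0.1, min_n=30):
    # Within a bin the algorithm's output is (nearly) constant, so any
    # residual association between expert forecasts and outcomes is signal
    # the algorithm cannot express -- a crude stand-in for the paper's test.
    useful = {}
    for b in np.unique(bins):
        m = bins == b
        if m.sum() < min_n:
            useful[b] = False  # too few samples to test reliably
            continue
        corr = np.corrcoef(expert_pred[m], y[m])[0, 1]
        useful[b] = bool(np.nan_to_num(corr) > min_corr)
    return useful

def combine(bins, algo_pred, expert_pred, useful, weight=0.5):
    # Selectively incorporate human feedback only in bins flagged ex-ante.
    out = algo_pred.astype(float).copy()
    for b, flag in useful.items():
        if flag:
            m = bins == b
            out[m] = (1 - weight) * out[m] + weight * expert_pred[m]
    return out
```

In bins where the test fails, the algorithm's prediction is left untouched; that is what makes the incorporation of human feedback selective rather than uniform.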
LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation
Existing automatic evaluation of text-to-image synthesis can only provide an image-text matching score, without considering object-level compositionality, which results in poor correlation with human judgments. In this work, we propose LLMScore, a new framework that offers evaluation scores with multi-granularity compositionality. LLMScore leverages large language models (LLMs) to evaluate text-to-image models. Initially, it transforms the image into image-level and object-level visual descriptions. Then an evaluation instruction is fed into the LLMs to measure the alignment between the synthesized image and the text, ultimately generating a score accompanied by a rationale. Our extensive analysis reveals that LLMScore has the highest correlation with human judgments on a wide range of datasets (Attribute Binding Contrast, Concept Conjunction, MSCOCO, DrawBench, PaintSkills). Notably, LLMScore achieves Kendall's tau correlations with human evaluations that are 58.8% and 31.2% higher than the commonly adopted text-image matching metrics CLIP and BLIP, respectively.
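The two-stage pipeline is easy to mock up. Below is a minimal Python sketch; `build_instruction`, `llm_score`, `query_llm`, and the toy scores are my own illustrative stand-ins (any captioner and any instruction-following LLM would fill the gaps), while `scipy.stats.kendalltau` is the real call behind the correlation the abstract reports. This illustrates the idea, not the authors' implementation.

```python
from scipy.stats import kendalltau

def build_instruction(text_prompt: str, descriptions: str) -> str:
    # Fold the synthesis prompt and the image-/object-level descriptions
    # into one evaluation instruction, as the pipeline above outlines.
    return (
        "You are judging a text-to-image model.\n"
        f"Prompt: {text_prompt}\n"
        f"What the image shows: {descriptions}\n"
        "On the first line, rate prompt-image alignment from 0 to 100; "
        "then give a short rationale."
    )

def llm_score(text_prompt: str, descriptions: str, query_llm) -> tuple[float, str]:
    # `query_llm` is any callable str -> str wrapping an LLM of your choice.
    reply = query_llm(build_instruction(text_prompt, descriptions))
    first_line, _, rationale = reply.partition("\n")
    return float(first_line.strip()), rationale.strip()

# Agreement with human raters is then just Kendall's tau over paired
# scores (toy numbers shown):
model_scores = [72.0, 15.0, 88.0, 40.0]
human_scores = [70.0, 25.0, 90.0, 35.0]
tau, _ = kendalltau(model_scores, human_scores)
print(f"Kendall's tau = {tau:.2f}")
```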